Search | Portal Regional da BVS

1.

The probability of edge existence due to node degree: a baseline for network-based predictions.

Zietz, Michael; Himmelstein, Daniel S; Kloster, Kyle; Williams, Christopher; Nagle, Michael W; Greene, Casey S.

Gigascience ; 13, 2024 Jan 02.

Article in English | MEDLINE | ID: mdl-38323677

ABSTRACT

Important tasks in biomedical discovery such as predicting gene functions, gene-disease associations, and drug repurposing opportunities are often framed as network edge prediction. The number of edges connecting to a node, termed degree, can vary greatly across nodes in real biomedical networks, and the distribution of degrees varies between networks. If degree strongly influences edge prediction, then imbalance or bias in the distribution of degrees could lead to nonspecific or misleading predictions. We introduce a network permutation framework to quantify the effects of node degree on edge prediction. Our framework decomposes performance into the proportions attributable to degree and the network's specific connections using network permutation to generate features that depend only on degree. We discover that performance attributable to factors other than degree is often only a small portion of overall performance. Researchers seeking to predict new or missing edges in biological networks should use our permutation approach to obtain a baseline for performance that may be nonspecific because of degree. We released our methods as an open-source Python package (https://github.com/hetio/xswap/).

Subjects

Algorithms , Probability

2.

Hetnet connectivity search provides rapid insights into how two biomedical entities are related.

Himmelstein, Daniel S; Zietz, Michael; Rubinetti, Vincent; Kloster, Kyle; Heil, Benjamin J; Alquaddoomi, Faisal; Hu, Dongbo; Nicholson, David N; Hao, Yun; Sullivan, Blair D; Nagle, Michael W; Greene, Casey S.

bioRxiv ; 2023 Jan 07.

Article in English | MEDLINE | ID: mdl-36711546

ABSTRACT

Hetnets, short for "heterogeneous networks", contain multiple node and relationship types and offer a way to encode biomedical knowledge. One such example, Hetionet, connects 11 types of nodes (including genes, diseases, drugs, pathways, and anatomical structures) with over 2 million edges of 24 types. Previous work has demonstrated that supervised machine learning methods applied to such networks can identify drug repurposing opportunities. However, a training set of known relationships does not exist for many types of node pairs, even when it would be useful to examine how nodes of those types are meaningfully connected. For example, users may be curious not only how metformin is related to breast cancer, but also how the GJA1 gene might be involved in insomnia. We developed a new procedure, termed hetnet connectivity search, that proposes important paths between any two nodes without requiring a supervised gold standard. The algorithm behind connectivity search identifies types of paths that occur more frequently than would be expected by chance (based on node degree alone). We find that predictions are broadly similar to those from previously described supervised approaches for certain node type pairs. Scoring of individual paths is based on the most specific paths of a given type. Several optimizations were required to precompute significant instances of node connectivity at the scale of large knowledge graphs. We implemented the method on Hetionet and provide an online interface at https://het.io/search. We provide an open source implementation of these methods in our new Python package named hetmatpy.

3.

The probability of edge existence due to node degree: a baseline for network-based predictions.

Zietz, Michael; Himmelstein, Daniel S; Kloster, Kyle; Williams, Christopher; Nagle, Michael W; Greene, Casey S.

bioRxiv ; 2023 Jan 06.

Article in English | MEDLINE | ID: mdl-36711569

ABSTRACT

Important tasks in biomedical discovery such as predicting gene functions, gene-disease associations, and drug repurposing opportunities are often framed as network edge prediction. The number of edges connecting to a node, termed degree, can vary greatly across nodes in real biomedical networks, and the distribution of degrees varies between networks. If degree strongly influences edge prediction, then imbalance or bias in the distribution of degrees could lead to nonspecific or misleading predictions. We introduce a network permutation framework to quantify the effects of node degree on edge prediction. Our framework decomposes performance into the proportions attributable to degree and the network's specific connections. We discover that performance attributable to factors other than degree is often only a small portion of overall performance. Degree's predictive performance diminishes when the networks used for training and testing, despite measuring the same biological relationships, were generated using distinct techniques and hence have large differences in degree distribution. We introduce the permutation-derived edge prior as the probability that an edge exists based only on degree. The edge prior shows excellent discrimination and calibration for 20 biomedical networks (16 bipartite, 3 undirected, 1 directed), with AUROCs frequently exceeding 0.85. Researchers seeking to predict new or missing edges in biological networks should use the edge prior as a baseline to identify the fraction of performance that is nonspecific because of degree. We released our methods as an open-source Python package (https://github.com/hetio/xswap/).
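The permutation-derived edge prior described in this abstract can be illustrated with a small pure-Python sketch. It is an assumption-laden illustration and does not use the xswap package's API: the function names `degree_counts`, `xswap_permute`, and `edge_prior`, the number of swaps per permutation, and the number of permutations are all hypothetical choices. The sketch handles only undirected simple graphs. It repeatedly rewires pairs of edges so that every node keeps its degree, then estimates the prior for a node pair as the fraction of permuted networks in which that pair is connected.

```python
import random
from collections import Counter


def degree_counts(edges):
    """Return a Counter mapping each node to its degree."""
    return Counter(node for edge in edges for node in edge)


def xswap_permute(edges, n_swaps, rng):
    """Degree-preserving randomization of an undirected simple graph.

    Picks two edges (a, b) and (c, d) and rewires them to (a, d) and
    (c, b). Swaps that would create a self-loop or a duplicate edge are
    skipped, so every node keeps its original degree.
    """
    edges = [tuple(sorted(edge)) for edge in edges]
    present = set(edges)
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(edges)), 2)
        a, b = edges[i]
        c, d = edges[j]
        # Undirected edges have no orientation, so flip one at random.
        if rng.random() < 0.5:
            c, d = d, c
        if a == d or c == b:  # swap would create a self-loop
            continue
        new_1 = tuple(sorted((a, d)))
        new_2 = tuple(sorted((c, b)))
        if new_1 in present or new_2 in present:  # duplicate edge
            continue
        present -= {edges[i], edges[j]}
        present |= {new_1, new_2}
        edges[i], edges[j] = new_1, new_2
    return edges


def edge_prior(edges, n_permutations=200, seed=0):
    """Estimate P(edge exists | degrees) as a frequency over permutations.

    Returns a dict from node pair to the fraction of permuted networks
    containing that pair. Pairs never observed have an implicit prior of 0.
    """
    rng = random.Random(seed)
    counts = Counter()
    current = list(edges)
    for _ in range(n_permutations):
        # Assumption: 10 attempted swaps per edge between samples.
        current = xswap_permute(current, 10 * len(current), rng)
        counts.update(current)
    return {pair: n / n_permutations for pair, n in counts.items()}
```

Because each permuted network has the same number of edges as the original, the estimated priors summed over all node pairs equal the edge count. A predictor can then be compared against this degree-only baseline.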

4.

Unifying the identification of biomedical entities with the Bioregistry.

Hoyt, Charles Tapley; Balk, Meghan; Callahan, Tiffany J; Domingo-Fernández, Daniel; Haendel, Melissa A; Hegde, Harshad B; Himmelstein, Daniel S; Karis, Klas; Kunze, John; Lubiana, Tiago; Matentzoglu, Nicolas; McMurry, Julie; Moxon, Sierra; Mungall, Christopher J; Rutz, Adriano; Unni, Deepak R; Willighagen, Egon; Winston, Donald; Gyori, Benjamin M.

Sci Data ; 9(1): 714, 2022 11 19.

Article in English | MEDLINE | ID: mdl-36402838

ABSTRACT

The standardized identification of biomedical entities is a cornerstone of interoperability, reuse, and data integration in the life sciences. Several registries have been developed to catalog resources maintaining identifiers for biomedical entities such as small molecules, proteins, cell lines, and clinical trials. However, existing registries have struggled to provide sufficient coverage and metadata standards that meet the evolving needs of modern life sciences researchers. Here, we introduce the Bioregistry, an integrative, open, community-driven metaregistry that synthesizes and substantially expands upon 23 existing registries. The Bioregistry addresses the need for a sustainable registry by leveraging public infrastructure and automation, and employing a progressive governance model centered around open code and open data to foster community contribution. The Bioregistry can be used to support the standardized annotation of data, models, ontologies, and scientific literature, thereby promoting their interoperability and reuse. The Bioregistry can be accessed through https://bioregistry.io and its source code and data are available under the MIT and CC0 Licenses at https://github.com/biopragmatics/bioregistry.

5.

Expanding a database-derived biomedical knowledge graph via multi-relation extraction from biomedical abstracts.

Nicholson, David N; Himmelstein, Daniel S; Greene, Casey S.

BioData Min ; 15(1): 26, 2022 Oct 18.

Article in English | MEDLINE | ID: mdl-36258252

ABSTRACT

BACKGROUND: Knowledge graphs support biomedical research efforts by providing contextual information for biomedical entities, constructing networks, and supporting the interpretation of high-throughput analyses. These databases are populated via manual curation, which is challenging to scale with an exponentially rising publication rate. Data programming is a paradigm that circumvents this arduous manual process by combining databases with simple rules and heuristics written as label functions, which are programs designed to annotate textual data automatically. Unfortunately, writing a useful label function requires substantial error analysis and is a nontrivial task that takes multiple days per function. This bottleneck makes populating a knowledge graph with multiple nodes and edge types practically infeasible. Thus, we sought to accelerate the label function creation process by evaluating how label functions can be re-used across multiple edge types. RESULTS: We obtained entity-tagged abstracts and subsetted these entities to only contain compounds, genes, and disease mentions. We extracted sentences containing co-mentions of certain biomedical entities contained in a previously described knowledge graph, Hetionet v1. We trained a baseline model that used database-only label functions and then used a sampling approach to measure how well adding edge-specific or edge-mismatch label function combinations improved over our baseline. Next, we trained a discriminator model to detect sentences that indicated a biomedical relationship and then estimated the number of edge types that could be recalled and added to Hetionet v1. We found that adding edge-mismatch label functions rarely improved relationship extraction, while control edge-specific label functions did. There were two exceptions to this trend, Compound-binds-Gene and Gene-interacts-Gene, which both indicated physical relationships and showed signs of transferability. 
Across the scenarios tested, discriminative model performance strongly depends on generated annotations. Using the best discriminative model for each edge type, we recalled close to 30% of established edges within Hetionet v1. CONCLUSIONS: Our results show that this framework can incorporate novel edges into our source knowledge graph. However, results with label function transfer were mixed. Only label functions describing very similar edge types supported improved performance when transferred. We expect that the continued development of this strategy may provide essential building blocks to populating biomedical knowledge graphs with discoveries, ensuring that these resources include cutting-edge results.
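The idea of a label function, as described in this abstract, can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the authors' pipeline: the function names, keyword lists, and entity names (`CompoundX`, `GeneY`) are hypothetical, and the votes are combined by simple majority vote, which stands in for the label model that a data programming framework would normally fit.

```python
POSITIVE, NEGATIVE, ABSTAIN = 1, 0, -1


def lf_in_database(sentence, pair, known_pairs):
    """Database-only label function: vote positive for known pairs."""
    return POSITIVE if pair in known_pairs else ABSTAIN


def lf_binding_keyword(sentence, pair, known_pairs):
    """Text heuristic: vote positive when a binding phrase appears."""
    keywords = ("binds", "inhibitor of", "agonist of")
    text = sentence.lower()
    return POSITIVE if any(k in text for k in keywords) else ABSTAIN


def lf_negation(sentence, pair, known_pairs):
    """Text heuristic: vote negative when the relation is negated."""
    phrases = ("does not bind", "no interaction")
    text = sentence.lower()
    return NEGATIVE if any(p in text for p in phrases) else ABSTAIN


LABEL_FUNCTIONS = (lf_in_database, lf_binding_keyword, lf_negation)


def majority_vote(votes):
    """Combine votes, ignoring abstentions; ties resolve to ABSTAIN."""
    cast = [v for v in votes if v != ABSTAIN]
    positives = sum(v == POSITIVE for v in cast)
    negatives = len(cast) - positives
    if positives == negatives:
        return ABSTAIN
    return POSITIVE if positives > negatives else NEGATIVE


def label(sentence, pair, known_pairs):
    """Apply every label function to one candidate sentence."""
    votes = [lf(sentence, pair, known_pairs) for lf in LABEL_FUNCTIONS]
    return majority_vote(votes)
```

In this framing, re-using label functions across edge types amounts to applying heuristics written for one relationship, such as the keyword list above, to candidate sentences for another.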

6.

Hetnet connectivity search provides rapid insights into how biomedical entities are related.

Himmelstein, Daniel S; Zietz, Michael; Rubinetti, Vincent; Kloster, Kyle; Heil, Benjamin J; Alquaddoomi, Faisal; Hu, Dongbo; Nicholson, David N; Hao, Yun; Sullivan, Blair D; Nagle, Michael W; Greene, Casey S.

Gigascience ; 12, 2022 12 28.

Article in English | MEDLINE | ID: mdl-37503959

ABSTRACT

BACKGROUND: Hetnets, short for "heterogeneous networks," contain multiple node and relationship types and offer a way to encode biomedical knowledge. One such example, Hetionet, connects 11 types of nodes (including genes, diseases, drugs, pathways, and anatomical structures) with over 2 million edges of 24 types. Previous work has demonstrated that supervised machine learning methods applied to such networks can identify drug repurposing opportunities. However, a training set of known relationships does not exist for many types of node pairs, even when it would be useful to examine how nodes of those types are meaningfully connected. For example, users may be curious about not only how metformin is related to breast cancer but also how a given gene might be involved in insomnia. FINDINGS: We developed a new procedure, termed hetnet connectivity search, that proposes important paths between any 2 nodes without requiring a supervised gold standard. The algorithm behind connectivity search identifies types of paths that occur more frequently than would be expected by chance (based on node degree alone). Several optimizations were required to precompute significant instances of node connectivity at the scale of large knowledge graphs. CONCLUSION: We implemented the method on Hetionet and provide an online interface at https://het.io/search. We provide an open-source implementation of these methods in our new Python package named hetmatpy.

Subjects

Algorithms , Probability

7.

An Open-Publishing Response to the COVID-19 Infodemic.

Rando, Halie M; Boca, Simina M; McGowan, Lucy D'Agostino; Himmelstein, Daniel S; Robson, Michael P; Rubinetti, Vincent; Velazquez, Ryan; Greene, Casey S; Gitter, Anthony.

ArXiv ; 2021 Sep 17.

Article in English | MEDLINE | ID: mdl-34545336

ABSTRACT

The COVID-19 pandemic catalyzed the rapid dissemination of papers and preprints investigating the disease and its associated virus, SARS-CoV-2. The multifaceted nature of COVID-19 demands a multidisciplinary approach, but the urgency of the crisis combined with the need for social distancing measures present unique challenges to collaborative science. We applied a massive online open publishing approach to this problem using Manubot. Through GitHub, collaborators summarized and critiqued COVID-19 literature, creating a review manuscript. Manubot automatically compiled citation information for referenced preprints, journal publications, websites, and clinical trials. Continuous integration workflows retrieved up-to-date data from online sources nightly, regenerating some of the manuscript's figures and statistics. Manubot rendered the manuscript into PDF, HTML, LaTeX, and DOCX outputs, immediately updating the version available online upon the integration of new content. Through this effort, we organized over 50 scientists from a range of backgrounds who evaluated over 1,500 sources and developed seven literature reviews. While many efforts from the computational community have focused on mining COVID-19 literature, our project illustrates the power of open publishing to organize both technical and non-technical scientists to aggregate and disseminate information in response to an evolving crisis.

8.

Analysis of scientific society honors reveals disparities.

Le, Trang T; Himmelstein, Daniel S; Hippen, Ariel A; Gazzara, Matthew R; Greene, Casey S.

Cell Syst ; 12(9): 900-906.e5, 2021 09 22.

Article in English | MEDLINE | ID: mdl-34555325

ABSTRACT

Delivering a keynote talk at a conference organized by a scientific society or being named as a fellow by such a society indicates that a scientist is held in high regard by their colleagues. To explore if the distribution of such indicators of esteem in the field of bioinformatics reflects the composition of this field, we compared the gender, name origin, and country of affiliation of 412 honorees from the "International Society for Computational Biology" (75 fellows and 337 keynote speakers) with over 170,000 last authorships on computational biology papers between 1993 and 2019. The proportion of honors bestowed on women was similar to that of the field's overall last authorship rate. However, names of East Asian origin have been persistently underrepresented among honorees. Moreover, there were roughly twice as many honors bestowed on scientists with an affiliation in the United States as expected based on literature authorship. A record of this paper's transparent peer review process is included in the supplemental information.

Subjects

Computational Biology , Scientific Societies , Female , Humans , United States

9.

An Open-Publishing Response to the COVID-19 Infodemic.

Rando, Halie M; Boca, Simina M; McGowan, Lucy D'Agostino; Himmelstein, Daniel S; Robson, Michael P; Rubinetti, Vincent; Velazquez, Ryan; Greene, Casey S; Gitter, Anthony.

CEUR Workshop Proc ; 2976: 29-38, 2021 Sep.

Article in English | MEDLINE | ID: mdl-35558551

ABSTRACT

The COVID-19 pandemic catalyzed the rapid dissemination of papers and preprints investigating the disease and its associated virus, SARS-CoV-2. The multifaceted nature of COVID-19 demands a multidisciplinary approach, but the urgency of the crisis combined with the need for social distancing measures present unique challenges to collaborative science. We applied a massive online open publishing approach to this problem using Manubot. Through GitHub, collaborators summarized and critiqued COVID-19 literature, creating a review manuscript. Manubot automatically compiled citation information for referenced preprints, journal publications, websites, and clinical trials. Continuous integration workflows retrieved up-to-date data from online sources nightly, regenerating some of the manuscript's figures and statistics. Manubot rendered the manuscript into PDF, HTML, LaTeX, and DOCX outputs, immediately updating the version available online upon the integration of new content. Through this effort, we organized over 50 scientists from a range of backgrounds who evaluated over 1,500 sources and developed seven literature reviews. While many efforts from the computational community have focused on mining COVID-19 literature, our project illustrates the power of open publishing to organize both technical and non-technical scientists to aggregate and disseminate information in response to an evolving crisis.

10.

Is authorship sufficient for today's collaborative research? A call for contributor roles.

Vasilevsky, Nicole A; Hosseini, Mohammad; Teplitzky, Samantha; Ilik, Violeta; Mohammadi, Ehsan; Schneider, Juliane; Kern, Barbara; Colomb, Julien; Edmunds, Scott C; Gutzman, Karen; Himmelstein, Daniel S; White, Marijane; Smith, Britton; O'Keefe, Lisa; Haendel, Melissa; Holmes, Kristi L.

Account Res ; 28(1): 23-43, 2021 01.

Article in English | MEDLINE | ID: mdl-32602379

ABSTRACT

Assigning authorship and recognizing contributions to scholarly works is challenging on many levels. Here we discuss ethical, social, and technical challenges to the concept of authorship that may impede the recognition of contributions to a scholarly work. Recent work in the field of authorship shows that shifting to a more inclusive contributorship approach may address these challenges. Recent efforts to enable better recognition of contributions to scholarship include the development of the Contributor Role Ontology (CRO), which extends the CRediT taxonomy and can be used in information systems for structuring contributions. We also introduce the Contributor Attribution Model (CAM), which provides a simple data model that relates the contributor to research objects via the role that they played, as well as the provenance of the information. Finally, requirements for the adoption of a contributorship-based approach are discussed.

Subjects

Authorship , Humans

11.

Compressing gene expression data using multiple latent space dimensionalities learns complementary biological representations.

Way, Gregory P; Zietz, Michael; Rubinetti, Vincent; Himmelstein, Daniel S; Greene, Casey S.

Genome Biol ; 21(1): 109, 2020 05 11.

Article in English | MEDLINE | ID: mdl-32393369

ABSTRACT

BACKGROUND: Unsupervised compression algorithms applied to gene expression data extract latent or hidden signals representing technical and biological sources of variation. However, these algorithms require a user to select a biologically appropriate latent space dimensionality. In practice, most researchers fit a single algorithm and latent dimensionality. We sought to determine the extent by which selecting only one fit limits the biological features captured in the latent representations and, consequently, limits what can be discovered with subsequent analyses. RESULTS: We compress gene expression data from three large datasets consisting of adult normal tissue, adult cancer tissue, and pediatric cancer tissue. We train many different models across a large range of latent space dimensionalities and observe various performance differences. We identify more curated pathway gene sets significantly associated with individual dimensions in denoising autoencoder and variational autoencoder models trained using an intermediate number of latent dimensionalities. Combining compressed features across algorithms and dimensionalities captures the most pathway-associated representations. When trained with different latent dimensionalities, models learn strongly associated and generalizable biological representations including sex, neuroblastoma MYCN amplification, and cell types. Stronger signals, such as tumor type, are best captured in models trained at lower dimensionalities, while more subtle signals such as pathway activity are best identified in models trained with more latent dimensionalities. CONCLUSIONS: There is no single best latent dimensionality or compression algorithm for analyzing gene expression data. Instead, using features derived from different compression models across multiple latent space dimensionalities enhances biological representations.

Subjects

Data Compression/methods , Gene Expression , Biological Models , Adult , Child , Humans , Neoplasms/metabolism , Supervised Machine Learning

12.

Open collaborative writing with Manubot.

Himmelstein, Daniel S; Rubinetti, Vincent; Slochower, David R; Hu, Dongbo; Malladi, Venkat S; Greene, Casey S; Gitter, Anthony.

PLoS Comput Biol ; 15(6): e1007128, 2019 06.

Article in English | MEDLINE | ID: mdl-31233491

ABSTRACT

Open, collaborative research is a powerful paradigm that can immensely strengthen the scientific process by integrating broad and diverse expertise. However, traditional research and multi-author writing processes break down at scale. We present new software named Manubot, available at https://manubot.org, to address the challenges of open scholarly writing. Manubot adopts the contribution workflow used by many large-scale open source software projects to enable collaborative authoring of scholarly manuscripts. With Manubot, manuscripts are written in Markdown and stored in a Git repository to precisely track changes over time. By hosting manuscript repositories publicly, such as on GitHub, multiple authors can simultaneously propose and review changes. A cloud service automatically evaluates proposed changes to catch errors. Publication with Manubot is continuous: When a manuscript's source changes, the rendered outputs are rebuilt and republished to a web page. Manubot automates bibliographic tasks by implementing citation by identifier, where users cite persistent identifiers (e.g. DOIs, PubMed IDs, ISBNs, URLs), whose metadata is then retrieved and converted to a user-specified style. Manubot modernizes publishing to align with the ideals of open science by making it transparent, reproducible, immediate, versioned, collaborative, and free of charge.
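Citation by identifier, as described in this abstract, starts with recognizing identifier-based citation keys in the Markdown source. The sketch below shows only that first step and is not Manubot's own code. The regular expression, the list of supported prefixes, and the function name `extract_citations` are assumptions made for illustration, and the metadata retrieval that follows in the real tool is omitted.

```python
import re

# Assumed key format: "@prefix:identifier", with the identifier ending at
# whitespace, a closing bracket, a semicolon, or a comma.
CITATION_PATTERN = re.compile(r"@(doi|pmid|isbn|url):([^\s\];,]+)")


def extract_citations(markdown_text):
    """Return unique (prefix, identifier) pairs in order of first use."""
    seen = set()
    citations = []
    for match in CITATION_PATTERN.finditer(markdown_text):
        key = (match.group(1), match.group(2))
        if key not in seen:
            seen.add(key)
            citations.append(key)
    return citations
```

Each unique pair would then be resolved to bibliographic metadata and rendered in the user-specified style, so that a repeated citation of the same identifier produces a single reference entry.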

Subjects

Publishing , Software , Writing , Humans , Medical Manuscripts as Topic

13.

Opportunities and obstacles for deep learning in biology and medicine.

Ching, Travers; Himmelstein, Daniel S; Beaulieu-Jones, Brett K; Kalinin, Alexandr A; Do, Brian T; Way, Gregory P; Ferrero, Enrico; Agapow, Paul-Michael; Zietz, Michael; Hoffman, Michael M; Xie, Wei; Rosen, Gail L; Lengerich, Benjamin J; Israeli, Johnny; Lanchantin, Jack; Woloszynek, Stephen; Carpenter, Anne E; Shrikumar, Avanti; Xu, Jinbo; Cofer, Evan M; Lavender, Christopher A; Turaga, Srinivas C; Alexandari, Amr M; Lu, Zhiyong; Harris, David J; DeCaprio, Dave; Qi, Yanjun; Kundaje, Anshul; Peng, Yifan; Wiley, Laura K; Segler, Marwin H S; Boca, Simina M; Swamidass, S Joshua; Huang, Austin; Gitter, Anthony; Greene, Casey S.

J R Soc Interface ; 15(141), 2018 04.

Article in English | MEDLINE | ID: mdl-29618526

ABSTRACT

Deep learning describes a class of machine learning algorithms that are capable of combining raw inputs into layers of intermediate features. These algorithms have recently shown impressive results across a variety of domains. Biology and medicine are data-rich disciplines, but the data are complex and often ill-understood. Hence, deep learning techniques may be particularly well suited to solve problems of these fields. We examine applications of deep learning to a variety of biomedical problems (patient classification, fundamental biological processes and treatment of patients) and discuss whether deep learning will be able to transform these tasks or if the biomedical sphere poses unique challenges. Following from an extensive literature review, we find that deep learning has yet to revolutionize biomedicine or definitively resolve any of the most pressing challenges in the field, but promising advances have been made on the prior state of the art. Even though improvements over previous baselines have been modest in general, the recent progress indicates that deep learning methods will provide valuable means for speeding up or aiding human investigation. Though progress has been made linking a specific neural network's prediction to input features, understanding how users should interpret these models to make testable hypotheses about the system under study remains an open challenge. Furthermore, the limited amount of labelled data for training presents problems in some domains, as do legal and privacy constraints on work with sensitive health records. Nonetheless, we foresee deep learning enabling changes at both bench and bedside with the potential to transform several areas of biology and medicine.

Subjects

Biomedical Research/trends , Biomedical Technology/trends , Deep Learning/trends , Algorithms , Biomedical Research/methods , Decision Making , Delivery of Health Care/methods , Delivery of Health Care/trends , Disease/genetics , Drug Design , Electronic Health Records/trends , Humans , Terminology as Topic

14.

Sci-Hub provides access to nearly all scholarly literature.

Himmelstein, Daniel S; Romero, Ariel Rodriguez; Levernier, Jacob G; Munro, Thomas Anthony; McLaughlin, Stephen Reid; Greshake Tzovaras, Bastian; Greene, Casey S.

Elife ; 7, 2018 03 01.

Article in English | MEDLINE | ID: mdl-29424689

ABSTRACT

The website Sci-Hub enables users to download PDF versions of scholarly articles, including many articles that are paywalled at their journal's site. Sci-Hub has grown rapidly since its creation in 2011, but the extent of its coverage has been unclear. Here we report that, as of March 2017, Sci-Hub's database contains 68.9% of the 81.6 million scholarly articles registered with Crossref and 85.1% of articles published in toll access journals. We find that coverage varies by discipline and publisher, and that Sci-Hub preferentially covers popular, paywalled content. For toll access articles, we find that Sci-Hub provides greater coverage than the University of Pennsylvania, a major research university in the United States. Green open access to toll access articles via licit services, on the other hand, remains quite limited. Our interactive browser at https://greenelab.github.io/scihub allows users to explore these findings in more detail. For the first time, nearly all scholarly literature is available gratis to anyone with an Internet connection, suggesting the toll access business model may become unsustainable.

Subjects

Access to Information , Bibliographic Databases , Scholarly Communication , Bibliometrics , Internet , Pennsylvania

15.

Association of HLA Genetic Risk Burden With Disease Phenotypes in Multiple Sclerosis.

Isobe, Noriko; Keshavan, Anisha; Gourraud, Pierre-Antoine; Zhu, Alyssa H; Datta, Esha; Schlaeger, Regina; Caillier, Stacy J; Santaniello, Adam; Lizée, Antoine; Himmelstein, Daniel S; Baranzini, Sergio E; Hollenbach, Jill; Cree, Bruce A C; Hauser, Stephen L; Oksenberg, Jorge R; Henry, Roland G.

JAMA Neurol ; 73(7): 795-802, 2016 07 01.

Article in English | MEDLINE | ID: mdl-27244296

ABSTRACT

IMPORTANCE: Although multiple HLA alleles associated with multiple sclerosis (MS) risk have been identified, genotype-phenotype studies in the HLA region remain scarce and inconclusive. OBJECTIVES: To investigate whether MS risk-associated HLA alleles also affect disease phenotypes. DESIGN, SETTING, AND PARTICIPANTS: A cross-sectional, case-control study comprising 652 patients with MS who had comprehensive phenotypic information and 455 individuals of European origin serving as controls was conducted at a single academic research site. Patients evaluated at the Multiple Sclerosis Center at University of California, San Francisco between July 2004 and September 2005 were invited to participate. Spinal cord imaging in the data set was acquired between July 2013 and March 2014; analysis was performed between December 2014 and December 2015. MAIN OUTCOMES AND MEASURES: Cumulative HLA genetic burden (HLAGB) calculated using the most updated MS-associated HLA alleles vs clinical and magnetic resonance imaging outcomes, including age at onset, disease severity, conversion time from clinically isolated syndrome to clinically definite MS, fractions of cortical and subcortical gray matter and cerebral white matter, brain lesion volume, spinal cord gray and white matter areas, upper cervical cord area, and the ratio of gray matter to the upper cervical cord area. Multivariate modeling was applied separately for each sex data set. RESULTS: Of the 652 patients with MS, 586 had no missing genetic data and were included in the HLAGB analysis. In these 586 patients (404 women [68.9%]; mean [SD] age at disease onset, 33.6 [9.4] years), HLAGB was higher than in controls (median [IQR], 0.7 [0-1.4] and 0 [-0.3 to 0.5], respectively; P = 1.8 × 10^-27). A total of 619 (95.8%) had relapsing-onset MS and 27 (4.2%) had progressive-onset MS. No significant difference was observed between relapsing-onset MS and primary progressive MS.
A higher HLAGB was associated with younger age at onset and the atrophy of subcortical gray matter fraction in women with relapsing-onset MS (standard β = -1.20 × 10^-1; P = 1.7 × 10^-2 and standard β = -1.67 × 10^-1; P = 2.3 × 10^-4, respectively), which were driven mainly by the HLA-DRB1*15:01 haplotype. In addition, we observed the distinct role of the HLA-A*24:02-B*07:02-DRB1*15:01 haplotype among the other common DRB1*15:01 haplotypes and a nominally protective effect of HLA-B*44:02 to the subcortical gray atrophy (standard β = -1.28 × 10^-1; P = 5.1 × 10^-3 and standard β = 9.52 × 10^-2; P = 3.6 × 10^-2, respectively). CONCLUSIONS AND RELEVANCE: We confirm and extend previous observations linking HLA MS susceptibility alleles with disease progression and specific clinical and magnetic resonance imaging phenotypic traits.

Subjects

Genetic Predisposition to Disease/genetics , Histocompatibility Antigens Class I/genetics , Multiple Sclerosis/genetics , Single Nucleotide Polymorphism/genetics , Adult , Age of Onset , Alleles , Brain/diagnostic imaging , Brain/pathology , Case-Control Studies , Cross-Sectional Studies , Female , Genetic Association Studies , Humans , Three-Dimensional Imaging , Male , Middle Aged , Multiple Sclerosis/diagnostic imaging , Multiple Sclerosis/physiopathology , Retrospective Studies , Spinal Cord/diagnostic imaging , Spinal Cord/pathology , White People , Young Adult

16.

Genetic Association-Guided Analysis of Gene Networks for the Study of Complex Traits.

Greene, Casey S; Himmelstein, Daniel S.

Circ Cardiovasc Genet ; 9(2): 179-84, 2016 Apr.

Article in English | MEDLINE | ID: mdl-27094199

Subjects

Gene Regulatory Networks , Genetic Association Studies , Heritable Quantitative Trait , Epidemiologic Confounding Factors , Genomics , Humans , Single Nucleotide Polymorphism/genetics

17.

Meta-analysis of genome-wide association studies reveals genetic overlap between Hodgkin lymphoma and multiple sclerosis.

Khankhanian, Pouya; Cozen, Wendy; Himmelstein, Daniel S; Madireddy, Lohith; Din, Lennox; van den Berg, Anke; Matsushita, Takuya; Glaser, Sally L; Moré, Jayaji M; Smedby, Karin E; Baranzini, Sergio E; Mack, Thomas M; Lizée, Antoine; de Sanjosé, Silvia; Gourraud, Pierre-Antoine; Nieters, Alexandra; Hauser, Stephen L; Cocco, Pierluigi; Maynadié, Marc; Foretová, Lenka; Staines, Anthony; Delahaye-Sourdeix, Manon; Li, Dalin; Bhatia, Smita; Melbye, Mads; Onel, Kenan; Jarrett, Ruth; McKay, James D; Oksenberg, Jorge R; Hjalgrim, Henrik.

Int J Epidemiol ; 45(3): 728-40, 2016 06.

Article in English | MEDLINE | ID: mdl-26971321

ABSTRACT

BACKGROUND: Based on epidemiological commonalities, multiple sclerosis (MS) and Hodgkin lymphoma (HL), two clinically distinct conditions, have long been suspected to be aetiologically related. MS and HL occur in roughly the same age groups, both are associated with Epstein-Barr virus infection and ultraviolet (UV) light exposure, and they cluster mutually in families (though not in individuals). We speculated if in addition to sharing environmental risk factors, MS and HL were also genetically related. Using data from genome-wide association studies (GWAS) of 1816 HL patients, 9772 MS patients and 25 255 controls, we therefore investigated the genetic overlap between the two diseases. METHODS: From among a common denominator of 404 K single nucleotide polymorphisms (SNPs) studied, we identified SNPs and human leukocyte antigen (HLA) alleles independently associated with both diseases. Next, we assessed the cumulative genome-wide effect of MS-associated SNPs on HL and of HL-associated SNPs on MS. To provide an interpretational frame of reference, we used data from published GWAS to create a genetic network of diseases within which we analysed proximity of HL and MS to autoimmune diseases and haematological and non-haematological malignancies. RESULTS: SNP analyses revealed genome-wide overlap between HL and MS, most prominently in the HLA region. Polygenic HL risk scores explained 4.44% of HL risk (Nagelkerke R²), but also 2.36% of MS risk. Conversely, polygenic MS risk scores explained 8.08% of MS risk and 1.94% of HL risk. In the genetic disease network, HL was closer to autoimmune diseases than to solid cancers. CONCLUSIONS: HL displays considerable genetic overlap with MS and other autoimmune diseases.

Subjects

Genome-Wide Association Study , Hodgkin Disease/genetics , Multiple Sclerosis/genetics , Single Nucleotide Polymorphism , Female , Gene Regulatory Networks , Genetic Predisposition to Disease , Humans , Linear Models , Male
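
The abstract of record 17 reports the share of disease risk explained by polygenic risk scores as Nagelkerke R². As a minimal sketch of how that statistic is usually computed, the following assumes a logistic model whose log-likelihood is available for both the fitted model and an intercept-only null model. The function name and its arguments are illustrative and are not taken from the paper.

```python
import math


def nagelkerke_r2(ll_null: float, ll_model: float, n: int) -> float:
    """Nagelkerke pseudo-R^2 from two log-likelihoods.

    ll_null  -- log-likelihood of the intercept-only model (hypothetical input)
    ll_model -- log-likelihood of the model including the risk score
    n        -- number of observations
    """
    # Cox-Snell R^2: 1 - (L_null / L_model)^(2/n), written with log-likelihoods.
    cox_snell = 1.0 - math.exp(2.0 * (ll_null - ll_model) / n)
    # Upper bound of Cox-Snell R^2, reached when the model likelihood equals 1.
    max_cox_snell = 1.0 - math.exp(2.0 * ll_null / n)
    # Nagelkerke rescales Cox-Snell so that the maximum attainable value is 1.
    return cox_snell / max_cox_snell
```

A model that does not improve on the null gives 0, and a model with log-likelihood 0 (a perfect fit) gives 1.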

18.

Erratum to: Evolving hard problems: generating human genetics datasets with a complex etiology.

Himmelstein, Daniel S; Greene, Casey S; Moore, Jason H.

BioData Min ; 9: 9, 2016.

Article in English | MEDLINE | ID: mdl-26848312

ABSTRACT

[This corrects the article DOI: 10.1186/1756-0381-4-21.].

19.

Heterogeneous Network Edge Prediction: A Data Integration Approach to Prioritize Disease-Associated Genes.

Himmelstein, Daniel S; Baranzini, Sergio E.

PLoS Comput Biol ; 11(7): e1004259, 2015 Jul.

Article in English | MEDLINE | ID: mdl-26158728

ABSTRACT

The first decade of Genome Wide Association Studies (GWAS) has uncovered a wealth of disease-associated variants. Two important derivations will be the translation of this information into a multiscale understanding of pathogenic variants and leveraging existing data to increase the power of existing and future studies through prioritization. We explore edge prediction on heterogeneous networks--graphs with multiple node and edge types--for accomplishing both tasks. First we constructed a network with 18 node types--genes, diseases, tissues, pathophysiologies, and 14 MSigDB (molecular signatures database) collections--and 19 edge types from high-throughput publicly-available resources. From this network composed of 40,343 nodes and 1,608,168 edges, we extracted features that describe the topology between specific genes and diseases. Next, we trained a model from GWAS associations and predicted the probability of association between each protein-coding gene and each of 29 well-studied complex diseases. The model, which achieved 132-fold enrichment in precision at 10% recall, outperformed any individual domain, highlighting the benefit of integrative approaches. We identified pleiotropy, transcriptional signatures of perturbations, pathways, and protein interactions as influential mechanisms explaining pathogenesis. Our method successfully predicted the results (with AUROC = 0.79) from a withheld multiple sclerosis (MS) GWAS despite starting with only 13 previously associated genes. Finally, we combined our network predictions with statistical evidence of association to propose four novel MS genes, three of which (JAK2, REL, RUNX3) validated on the masked GWAS. Furthermore, our predictions provide biological support highlighting REL as the causal gene within its gene-rich locus. Users can browse all predictions online (http://het.io). Heterogeneous network edge prediction effectively prioritized genetic associations and provides a powerful new approach for data integration across multiple domains.

Subjects

Chromosome Mapping/methods , Data Mining/methods , Genetic Databases , Genetic Predisposition to Disease/genetics , Genome-Wide Association Study/methods , Proteome/genetics , Algorithms , Animals , Humans , Protein Interaction Mapping/methods , Signal Transduction/genetics , Systems Integration
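
The abstract of record 19 summarizes model performance as fold enrichment in precision at 10% recall. The sketch below shows one common way to compute such a figure from ranked predictions. It assumes that enrichment means precision at the first rank where recall reaches the target, divided by the overall prevalence of positives. That definition, the function name and the parameters are assumptions made for illustration and may differ from the paper's exact procedure.

```python
def precision_enrichment_at_recall(scores, labels, recall_target=0.10):
    """Fold enrichment of precision over prevalence at a target recall.

    scores -- predicted probabilities, one per candidate edge
    labels -- 1 for a true association, 0 otherwise
    """
    total_pos = sum(labels)
    if total_pos == 0:
        raise ValueError("at least one positive label is required")
    # Baseline precision of a random ranking equals the prevalence.
    prevalence = total_pos / len(labels)
    # Rank candidates from highest to lowest score (ties keep input order).
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    true_pos = 0
    for rank, (_, label) in enumerate(ranked, start=1):
        true_pos += label
        if true_pos / total_pos >= recall_target:
            precision = true_pos / rank
            return precision / prevalence
    raise ValueError("recall_target must not exceed 1.0")
```

With ten candidates of which two are positive, and the top-ranked candidate positive, precision at the first rank is 1.0 against a prevalence of 0.2, so the function returns a 5-fold enrichment.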

20.

Understanding multicellular function and disease with human tissue-specific networks.

Greene, Casey S; Krishnan, Arjun; Wong, Aaron K; Ricciotti, Emanuela; Zelaya, Rene A; Himmelstein, Daniel S; Zhang, Ran; Hartmann, Boris M; Zaslavsky, Elena; Sealfon, Stuart C; Chasman, Daniel I; FitzGerald, Garret A; Dolinski, Kara; Grosser, Tilo; Troyanskaya, Olga G.

Nat Genet ; 47(6): 569-76, 2015 Jun.

Article in English | MEDLINE | ID: mdl-25915600

ABSTRACT

Tissue and cell-type identity lie at the core of human physiology and disease. Understanding the genetic underpinnings of complex tissues and individual cell lineages is crucial for developing improved diagnostics and therapeutics. We present genome-wide functional interaction networks for 144 human tissues and cell types developed using a data-driven Bayesian methodology that integrates thousands of diverse experiments spanning tissue and disease states. Tissue-specific networks predict lineage-specific responses to perturbation, identify the changing functional roles of genes across tissues and illuminate relationships among diseases. We introduce NetWAS, which combines genes with nominally significant genome-wide association study (GWAS) P values and tissue-specific networks to identify disease-gene associations more accurately than GWAS alone. Our webserver, GIANT, provides an interface to human tissue networks through multi-gene queries, network visualization, analysis tools including NetWAS and downloadable networks. GIANT enables systematic exploration of the landscape of interacting genes that shape specialized cellular functions across more than a hundred human tissues and cell types.

Subjects

Gene Regulatory Networks , Protein Interaction Maps , Alzheimer Disease/genetics , Alzheimer Disease/metabolism , Bayes Theorem , Cultured Cells , Gene Expression Regulation , Gene Ontology , Genome-Wide Association Study , Humans , Hypertension/genetics , Hypertension/metabolism , Biological Models , Smooth Muscle Myocytes/physiology , Organ Specificity , Parkinson Disease/genetics , Parkinson Disease/metabolism
